Generative AI
Debiasing Synthetic Data Generated by Deep Generative Models - Paloma Rabaey (Ghent University Hospital - SYNDARA)
While synthetic data hold great promise for privacy protection, their statistical analysis poses significant challenges that necessitate innovative solutions. The use of deep generative models (DGMs) for synthetic data generation is known to induce considerable bias and imprecision into synthetic data analyses, compromising their inferential utility relative to analyses of the original data. This bias and uncertainty can be substantial enough to prevent estimators from achieving their usual statistical convergence rates, even in analyses as simple as estimating a mean.
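To see why naive analysis of synthetic data goes wrong, consider a toy simulation (an illustrative sketch, not the paper's method): fitting a simple Gaussian "generative model" to a sample and then analyzing a synthetic draw from it as if it were original data already degrades confidence-interval coverage for the mean, because the extra sampling step inflates variance that the naive analysis ignores.

```python
# Toy simulation: naive analysis of synthetic data drawn from a fitted
# Gaussian "generative model". The extra sampling step roughly doubles the
# variance of the synthetic-data mean, so 95% confidence intervals computed
# as if the data were original under-cover (~0.83 instead of ~0.95).
import numpy as np

rng = np.random.default_rng(0)
n, true_mu, reps = 200, 5.0, 2000
cover_orig = cover_syn = 0

for _ in range(reps):
    original = rng.normal(true_mu, 2.0, size=n)

    # "Generative model": Gaussian fitted by maximum likelihood,
    # sampled once to produce a synthetic dataset of equal size.
    synthetic = rng.normal(original.mean(), original.std(), size=n)

    for data, is_orig in ((original, True), (synthetic, False)):
        half = 1.96 * data.std(ddof=1) / np.sqrt(n)  # naive 95% CI half-width
        hit = abs(data.mean() - true_mu) <= half
        if is_orig:
            cover_orig += hit
        else:
            cover_syn += hit

print(f"95% CI coverage, original data:  {cover_orig / reps:.3f}")  # ~0.95
print(f"95% CI coverage, synthetic data: {cover_syn / reps:.3f}")   # well below 0.95
```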
How ChatGPT could replace the internet as we know it
As I watched the internet debate over who would win between 100 men and 1 gorilla (yes, this is an actual debate online), I couldn't help but become invested in the outcome. I laughed at the ridiculous comments and marveled at the stunning AI videos. Then something hit me like a ton of bricks, or in this case, like a gorilla. "Self, OpenAI will probably be the biggest company we have ever seen." I know, I know -- that's a crazy thing to say, and it's incredibly random.
A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models
High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum - i.e. the dimension of the submanifold it belongs to - is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models: diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide an LID estimator which addresses the aforementioned deficiencies.
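To ground what LID means concretely, here is a minimal sketch of a classical baseline (local PCA over nearest neighbours); this is a standard textbook estimator, not the diffusion-based Fokker-Planck estimator the paper develops.

```python
# Classical LID baseline: run PCA on a point's k nearest neighbours and
# count how many directions are needed to explain most of the local variance.
import numpy as np

def local_pca_lid(X, idx, k=50, var_threshold=0.95):
    """Estimate the local intrinsic dimension of X[idx] from its k nearest neighbours."""
    dists = np.linalg.norm(X - X[idx], axis=1)
    nbrs = X[np.argsort(dists)[1 : k + 1]]  # exclude the point itself
    centered = nbrs - nbrs.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # local variance spectrum
    var_ratio = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(var_ratio, var_threshold) + 1)

# Sanity check: a 2-D linear submanifold embedded in 10-D ambient space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))
X = rng.normal(size=(1000, 2)) @ basis
print(local_pca_lid(X, idx=0))  # 2
```

Estimators like this scale poorly to images and other genuinely high-dimensional data, which is precisely the gap the paper's diffusion-model approach targets.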
Barking up the right tree: an approach to search over molecule synthesis DAGs
When designing new molecules with particular properties, it matters not only what to make but, crucially, how to make it. These instructions form a synthesis directed acyclic graph (DAG), describing how a large vocabulary of simple building blocks can be recursively combined through chemical reactions to create more complicated molecules of interest. Many current deep generative models for molecules, in contrast, ignore synthesizability. We therefore propose a deep generative model that better represents the real-world process by directly outputting molecule synthesis DAGs (see the sketch below). We argue that this provides sensible inductive biases, ensuring that our model searches over the same chemical space that chemists have access to, as well as interpretability. We show that our approach models chemical space well, producing a wide range of diverse molecules, and allows for unconstrained optimization of an inherently constrained problem: maximizing certain chemical properties such that discovered molecules are synthesizable.
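As a rough illustration of the data structure involved (hypothetical Python, assuming nothing about the paper's implementation): a synthesis DAG has purchasable building blocks at its leaves and reaction products at its internal nodes, so every generated molecule carries its own recipe.

```python
# Illustrative synthesis-DAG sketch: leaves are purchasable building blocks,
# internal nodes are reaction products. Names and classes are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class BuildingBlock:
    smiles: str                 # a purchasable reagent

@dataclass(frozen=True)
class Reaction:
    name: str                   # reaction template applied
    inputs: tuple               # BuildingBlock or Reaction nodes
    product_smiles: str

def leaves(node):
    """Collect the building blocks a synthesis DAG consumes."""
    if isinstance(node, BuildingBlock):
        return [node]
    out = []
    for parent in node.inputs:
        out.extend(leaves(parent))
    return out

# Example: acetanilide from aniline and acetic anhydride.
aniline = BuildingBlock("Nc1ccccc1")
acetic_anhydride = BuildingBlock("CC(=O)OC(C)=O")
acetanilide = Reaction("amide_coupling", (aniline, acetic_anhydride),
                       product_smiles="CC(=O)Nc1ccccc1")
print([b.smiles for b in leaves(acetanilide)])
```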
Estimating the Hallucination Rate of Generative AI - Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu
This paper presents a method for estimating the hallucination rate for in-context learning (ICL) with generative AI. In ICL, a conditional generative model (CGM) is prompted with a dataset and a prediction question and asked to generate a response. One interpretation of ICL assumes that the CGM computes the posterior predictive of an unknown Bayesian model, which implicitly defines a joint distribution over observable datasets and latent mechanisms. This joint distribution factorizes into two components: the model prior over mechanisms and the model likelihood of datasets given a mechanism. With this perspective, we define a hallucination as a generated response to the prediction question with low model likelihood given the mechanism. We develop a new method that takes an ICL problem and estimates the probability that a CGM will generate a hallucination. Our method only requires generating prediction questions and responses from the CGM and evaluating its response log probability. We empirically evaluate our method using large language models for synthetic regression and natural language ICL tasks.
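Read literally, the last two sentences suggest a simple Monte Carlo scheme. The sketch below is a hedged simplification (a fixed log-likelihood threshold and hypothetical `generate`/`logprob` callables standing in for model-specific calls), not the paper's exact estimator.

```python
# Hedged Monte Carlo sketch of the high-level recipe from the abstract:
# sample responses from the conditional generative model (CGM), score each
# with its own log probability, and report the fraction falling below a
# likelihood threshold. `generate` and `logprob` are hypothetical stand-ins.
def estimate_hallucination_rate(generate, logprob, prompt, question,
                                n_samples=100, log_threshold=-10.0):
    """Monte Carlo estimate of P(response has low model likelihood)."""
    low_likelihood = 0
    for _ in range(n_samples):
        response = generate(prompt, question)          # sample from the CGM
        if logprob(prompt, question, response) < log_threshold:
            low_likelihood += 1
    return low_likelihood / n_samples
```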
The OpenAI empire - podcast
In 2019, before most of the world had heard of the company, the technology journalist Karen Hao spent three days embedded in the offices of OpenAI. What she saw, she tells Michael Safi, was a company vastly at odds with its public image: that of a transparent non-profit developing artificial intelligence technology purely for the benefit of humanity. "They said that they were transparent. They said that they were collaborative. They were actually very secretive."
Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation
Generative AI and large language models hold great promise for enhancing programming education by generating individualized feedback and hints for learners. Recent work has primarily focused on improving the quality of generated feedback to match that of human tutors. While quality is an important performance criterion, it is not the only one to optimize for real-world educational deployments.
Sam Altman and Jony Ive Will Force A.I. Into Your Life
Ive led the designs of the original iMac, the iPad, and the Apple Watch, among other era-defining products. Then, in 2019, he left Apple to start his own design firm called LoveFrom. The news of his move to OpenAI felt something like learning that LeBron James was joining the Miami Heat: Ive had become synonymous with Apple's success, perhaps second only to Jobs. Now, after a period of independence, he was choosing a new team. The announcement of the deal with OpenAI--for a reported $6.5 billion in OpenAI equity--came via a press release, featuring a rather cuddly portrait of Ive with OpenAI's C.E.O. and co-founder, Sam Altman (shot by the British fashion photographer Craig McDean), and a faux-casual videotaped interview session between the two at San Francisco's Cafe Zoetrope. In it, Altman describes "a family of devices that would let people use A.I. to create all sorts of wonderful things," enabled by "magic intelligence in the cloud."